Results of the initial analysis of Tursiops RADseq data from the Gulf and Atlantic. The overall goals are to understand population structure, look for evidence of hybridization, and determine the range extent of the species.

  • Initial number of samples: 375
  • Samples dropped due to sequencing quality/coverage: 3 + 24
  • Samples dropped because they’re strandings: 11
  • Samples remaining in the data: 337
  • Number of SNPs after filtering: 7577
  • Number of SNPs after LD thinning: 4356

Link to scripts on github

See map of sample locations below.

Map of sample locations

Population structure

PCA

First, a basic PCA with all samples, colored by region:


PCA colored by depth

PCA colored by distance from shore

We get clustering by shallow vs. deep (nearshore vs. offshore) rather than by Atlantic vs. Gulf. This suggests that nearshore animals in the Atlantic and Gulf are more similar to each other than geographically proximate nearshore and offshore animals are.


Checking for an inversion

The patterns above could be produced by an inversion. If that were the case, we’d see the PCA loadings clustered in a single location in the genome. The PCA loadings don’t indicate this; instead, the loadings are distributed across the genome.




Admixture

Running Admixture for multiple values of K, with the cross-validation error for each. The lowest error indicates the most likely K. Cross-validation works by withholding a subset of the genotypes, predicting their values from the fitted model, and comparing the predictions to the withheld data.
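As a rough illustration of picking K from the cross-validation errors, here is a minimal sketch. It assumes the runs were logged in the format ADMIXTURE prints with its --cv option (lines such as "CV error (K=4): ..."). The function names are hypothetical and the numbers in the example log are made up for illustration; they are not the values from this analysis.

```python
import re

# Matches lines such as "CV error (K=4): 0.50713" in ADMIXTURE --cv output.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} from the concatenated log output of several runs."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Hypothetical log lines, for illustration only (NOT results from this dataset).
example_log = """
CV error (K=2): 0.60
CV error (K=3): 0.40
CV error (K=4): 0.50
"""

cv = parse_cv_errors(example_log)
print(cv)
print("lowest CV error at K =", best_k(cv))
```

In practice the CV error curve is often fairly flat near its minimum, so it is worth looking at the whole curve rather than only the single lowest value.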

The plot below is sorted with Atlantic on the left and Gulf on the right, then shallow to deep by collection location.

Note that the left group in the Atlantic is the nearshore individuals, which sit at the bottom left of the PCAs above.

Ordered shallow to deep, after subsetting to account for unequal sample sizes:


PCA with populations from Admixture and DAPC

NJ tree of pairwise genetic distances



Map of population assignments

Population assignment conclusion

  • Both Admixture and DAPC (not shown) indicate 4 populations as the most likely. They also agree on individual assignments for all but ~5 individuals, all of which are fairly admixed.
  • The four populations are:
    • Coastal Gulf
    • Coastal Atlantic
    • Intermediate
    • Offshore

For the rest of the analyses, I’ll run both a four-population analysis and a six-population analysis, the latter splitting the Intermediate and Offshore populations into Gulf and Atlantic. I think this makes sense biologically and is justified given our hypotheses going into the analysis.



Isolation by distance:

Next, testing whether there is isolation by distance (IBD) at various scales. For all tests I’m using PCA-based genetic distance with 64 PCs (based on Shirk et al.), but these values are very similar to the Plink and Euclidean distances.
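To make the distance measure concrete, here is a minimal sketch of a PCA-based genetic distance (Euclidean distance between individuals in PC space) and of one common way to test IBD, a Mantel-style permutation test against geographic distance. The Mantel test is an assumption on my part; these notes do not say which test was used. The toy data use 2 PCs and 4 individuals rather than 64 PCs, and all names and values are hypothetical.

```python
import math
import random


def pc_distance(a, b):
    """Euclidean distance between two individuals' PC score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(items, dist):
    """Distances for every pair (i, j) with i < j, keyed by (i, j)."""
    n = len(items)
    return {(i, j): dist(items[i], items[j])
            for i in range(n) for j in range(i + 1, n)}


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, n, permutations=999, seed=1):
    """Mantel r and a one-sided permutation p-value (tests for positive r).

    Individuals are relabelled in the genetic matrix only, which keeps the
    non-independence among pairwise distances intact.
    """
    pairs = sorted(gen)
    geo_vals = [geo[p] for p in pairs]
    r_obs = pearson([gen[p] for p in pairs], geo_vals)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [gen[tuple(sorted((perm[i], perm[j])))] for i, j in pairs]
        if pearson(shuffled, geo_vals) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy data (hypothetical): 4 individuals, 2 PCs, positions along a coast in km.
pc_scores = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
positions_km = [0.0, 10.0, 20.0, 30.0]

gen = pairwise(pc_scores, pc_distance)
geo = pairwise(positions_km, lambda a, b: abs(a - b))
r, p = mantel(gen, geo, n=len(pc_scores))
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

With only four individuals there are just 24 possible relabellings, so the p-value in the toy example cannot be small; the point is only to show the mechanics.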

IBD across all individuals

Across all individuals, there is an IBD signal. But I think this is likely driven by the mix of within- and between-population comparisons underlying it.


IBD: Four population analysis

IBD: Six population analysis

Splitting this out by population across both regions: within a population we generally see IBD, except for Atlantic Offshore. Inter-population comparisons show no signal.


IBD: Atlantic and Gulf

IBD: Combining Atlantic and Gulf

IBD: Comparing only between populations


IBD conclusion

  • Strong signal of IBD within populations in both the four- and six-population analyses.
  • Between the Gulf and Atlantic, there is relatively little IBD.



Pairwise Fst

Above the diagonal: Fst. Below the diagonal: p-values from 500 bootstrap permutations.



Introgression and hybrids

The Intermediate individuals are possibly hybrids: in the results above they’re intermediate in genetic distance for nearly all analyses. Here, I’ll test this more explicitly using:

  1. f3 statistics
  2. D statistics
  3. TreeMix
  4. triangle plots
  5. NewHybrids

f3 statistics:

The f3-statistic explicitly tests whether a taxon of interest results from admixture between two others: a significantly negative f3-statistic supports the admixture hypothesis, while a positive value is not informative. In our case, the taxon of interest (pop1) is Intermediate, while pop2 and pop3 are Coastal and Offshore.

First, I calculated these statistics with the four-population assignments:

F3-Statistics with Four Populations
pop1          pop2              pop3      est         se         z          p
Intermediate  Coastal_Atlantic  Offshore   0.0022091  0.0006265   3.526311  0.0004214
Intermediate  Coastal_Gulf      Offshore  -0.0024130  0.0005807  -4.155046  0.0000325

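Remember, positive values are not informative; negative values indicate a population resulting from admixture. The Coastal_Atlantic/Offshore estimate is positive, so it is uninformative. The Coastal_Gulf/Offshore estimate is significantly negative (z = -4.16, p = 0.0000325), which is consistent with admixture involving Coastal Gulf.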
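As a quick check on how to read the table, the z and p columns appear to follow from est and se as z = est / se with a two-sided normal p-value. That relationship is inferred from the numbers, not from the software documentation. A minimal sketch using the two rows of the four-population table (the function name is hypothetical):

```python
import math


def z_and_p(est, se):
    """Z-score and two-sided normal p-value for an estimate and its standard error."""
    z = est / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# est and se copied from the four-population f3 table in this section.
rows = {
    ("Intermediate", "Coastal_Atlantic", "Offshore"): (0.0022091, 0.0006265),
    ("Intermediate", "Coastal_Gulf", "Offshore"): (-0.0024130, 0.0005807),
}

for trio, (est, se) in rows.items():
    z, p = z_and_p(est, se)
    verdict = "negative: consistent with admixture" if z < 0 else "positive: not informative"
    print(trio, f"z = {z:.3f}", f"p = {p:.7f}", verdict)
```

Because the p-value is two-sided, a small p on its own does not indicate admixture; the sign of the estimate has to be checked as well.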

Next, I split into 6 populations:

F3-Statistics with Six Populations
pop1                   pop2              pop3               est         se         z           p
Intermediate Atlantic
Intermediate_Atlantic  Coastal_Atlantic  Offshore_Atlantic   0.0045946  0.0007621    6.028723  0.0000000
Intermediate_Atlantic  Coastal_Atlantic  Offshore_Gulf       0.0046803  0.0006610    7.080223  0.0000000
Intermediate_Atlantic  Coastal_Gulf      Offshore_Atlantic  -0.0008044  0.0006919   -1.162569  0.2450043
Intermediate_Atlantic  Coastal_Gulf      Offshore_Gulf       0.0007986  0.0006022    1.326149  0.1847903
Intermediate Gulf
Intermediate_Gulf      Coastal_Atlantic  Offshore_Atlantic  -0.0013517  0.0006981   -1.936443  0.0528134
Intermediate_Gulf      Coastal_Atlantic  Offshore_Gulf      -0.0012883  0.0005956   -2.163242  0.0305225
Intermediate_Gulf      Coastal_Gulf      Offshore_Atlantic  -0.0069896  0.0006486  -10.776006  0.0000000
Intermediate_Gulf      Coastal_Gulf      Offshore_Gulf      -0.0054088  0.0005587   -9.681816  0.0000000

This indicates that the Intermediate Gulf population is a result of admixture between Coastal Gulf and both the Offshore Atlantic and Offshore Gulf populations. There is no evidence of admixture for the Intermediate Atlantic population.



D-/f4 statistics:

D-statistics, or ABBA-BABA tests, test for introgression by looking for deviations from the pattern expected under incomplete lineage sorting alone. In short, if we have a tree (((P1,P2),P3),O), where O is the outgroup, with an ancestral “A” allele and a derived “B” allele, we should see the “ABBA” and “BABA” patterns at equal frequencies when there is incomplete lineage sorting and no gene flow. An overrepresentation of either ABBA or BABA suggests gene flow (see figure below, from the Dsuite tutorial).

Example of ABBA BABA test

I ran this test with Dsuite, with Aduncus as the outgroup. In the output below, P1 and P2 are always arranged so that D is positive, which indicates gene flow between P2 and P3. Swapping P1 and P2 would simply flip the sign of D to negative, indicating gene flow between P1 and P3.
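To make the columns in the tables below easier to read, here is a minimal sketch of how D relates to the ABBA and BABA counts, D = (ABBA - BABA) / (ABBA + BABA). The p.value_multTesting column also looks consistent with a Bonferroni correction (p multiplied by the number of trios and capped at 1); that is inferred from the numbers rather than stated anywhere in these notes. The function names are hypothetical.

```python
def d_statistic(abba, baba):
    """Patterson's D from (possibly fractional) ABBA and BABA site counts."""
    return (abba - baba) / (abba + baba)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Counts from the first row of the four-population table:
# P1 = Coastal_Atlantic, P2 = Coastal_Gulf, P3 = Intermediate.
d = d_statistic(abba=127.2450, baba=102.1150)
print(f"D = {d:.4f}")

# Third row of the same table: raw p = 0.1813020 across 4 trios.
print(f"adjusted p = {bonferroni(0.1813020, 4):.6f}")
```

An excess of ABBA over BABA gives a positive D, which matches the convention above that a positive D points to gene flow between P2 and P3.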

D-Statistics with Four Populations
P1                P2                P3            Dstatistic  Z.score  p.value    BBAA     ABBA      BABA      p.value_multTesting
Coastal_Atlantic  Coastal_Gulf      Intermediate  0.1095670   8.98576  0.0000000  150.116  127.2450  102.1150  0.0000000
Coastal_Atlantic  Coastal_Gulf      Offshore      0.0741165   5.38575  0.0000001  222.785   99.9134   86.1249  0.0000003
Intermediate      Coastal_Atlantic  Offshore      0.0224866   1.33676  0.1813020  193.663  105.0040  100.3850  0.7252080
Intermediate      Coastal_Gulf      Offshore      0.0938877   6.04430  0.0000000  207.231  107.2300   88.8233  0.0000000

  • The strongest evidence is for gene flow between the Coastal Gulf and Intermediate populations (row 1).
  • There is also gene flow between Coastal Gulf and Offshore.

D-Statistics with Six Populations
P1                     P2                     P3                     Dstatistic  Z.score   p.value    BBAA     ABBA      BABA      p.value_multTesting
Coastal_Atlantic       Coastal_Gulf           Intermediate_Atlantic  0.1091260   9.079790  0.0000000  148.899  127.3860  102.3200  0.0000000
Coastal_Atlantic       Coastal_Gulf           Intermediate_Gulf      0.1106680   8.425950  0.0000000  153.683  126.8270  101.5530  0.0000000
Coastal_Atlantic       Coastal_Gulf           Offshore_Atlantic      0.0669190   4.494920  0.0000070  233.537   95.5616   83.5740  0.0001392
Coastal_Atlantic       Coastal_Gulf           Offshore_Gulf          0.0797059   6.054830  0.0000000  213.896  103.4710   88.1945  0.0000000
Intermediate_Atlantic  Coastal_Atlantic       Intermediate_Gulf      0.0157025   1.093010  0.2743910  132.005  126.4550  122.5450  1.0000000
Intermediate_Atlantic  Coastal_Atlantic       Offshore_Atlantic      0.0194407   1.044180  0.2964030  204.033  100.5820   96.7456  1.0000000
Intermediate_Atlantic  Coastal_Atlantic       Offshore_Gulf          0.0279045   1.689720  0.0910813  187.623  108.5010  102.6100  1.0000000
Intermediate_Gulf      Coastal_Atlantic       Offshore_Atlantic      0.0120134   0.620274  0.5350770  198.652  100.7160   98.3249  1.0000000
Intermediate_Gulf      Coastal_Atlantic       Offshore_Gulf          0.0198127   1.171660  0.2413330  182.099  108.5270  104.3100  1.0000000
Offshore_Gulf          Coastal_Atlantic       Offshore_Atlantic      0.0311905   1.332680  0.1826370  136.978  112.7660  105.9440  1.0000000
Intermediate_Atlantic  Coastal_Gulf           Intermediate_Gulf      0.1237390   9.011840  0.0000000  137.862  132.5190  103.3350  0.0000000
Intermediate_Atlantic  Coastal_Gulf           Offshore_Atlantic      0.0838444   4.753020  0.0000020  218.818  102.2760   86.4520  0.0000401
Intermediate_Atlantic  Coastal_Gulf           Offshore_Gulf          0.1049880   6.465220  0.0000000  200.306  111.3950   90.2267  0.0000000
Intermediate_Gulf      Coastal_Gulf           Offshore_Atlantic      0.0758475   4.790090  0.0000017  213.209  101.9770   87.5979  0.0000333
Intermediate_Gulf      Coastal_Gulf           Offshore_Gulf          0.0963299   6.936480  0.0000000  194.498  110.9290   91.4356  0.0000000
Offshore_Gulf          Coastal_Gulf           Offshore_Atlantic      0.0877368   3.887960  0.0001011  144.113  116.5960   97.7866  0.0020218
Intermediate_Atlantic  Intermediate_Gulf      Offshore_Atlantic      0.0074459   0.629808  0.5288200  194.231   97.7570   96.3120  1.0000000
Intermediate_Atlantic  Intermediate_Gulf      Offshore_Gulf          0.0080867   0.717387  0.4731350  178.223  104.3450  102.6710  1.0000000
Offshore_Gulf          Intermediate_Atlantic  Offshore_Atlantic      0.0137456   0.843210  0.3991110  132.338  110.0900  107.1050  1.0000000
Offshore_Gulf          Intermediate_Gulf      Offshore_Atlantic      0.0208236   1.209310  0.2265440  131.112  108.5970  104.1660  1.0000000

  • There is gene flow between Coastal Gulf and both Intermediate Atlantic and Intermediate Gulf.
  • There is gene flow between Coastal Gulf and both Offshore populations.
  • There is no signal between the Intermediate and Offshore populations.



Treemix

TreeMix fits a maximum likelihood tree estimating drift among populations. Migration edges are then fit to the tree to improve the fit for populations that are poorly described by the model. Migration edges get added stepwise. You can estimate the number of migration events that best improves the model fit, similar to the Evanno-type approaches used with Structure.

no migration

First fit the trees with 0 migration events:

Four populations, no migration events

Six populations, no migration events

Adding migration events:

Best number of migration events:

  • Four populations: 2 migration events
  • Six populations: 4 migration events

The four-population result is consistent and clear. There are two migration edges: one between Intermediate and Offshore, and one between the node of Intermediate/Coastal Gulf and Offshore. This tree is well supported and consistent across runs (of 100 runs, nearly all show this exact tree, below).

In contrast, with 6 populations, things are much more uncertain and unstable. For the most likely tree (top left in the figure below) there are migration edges between the Intermediate populations and the branch leading to the Coastal populations. There is also migration from the node of Coastal Gulf/Intermediate to both Offshore populations. The next 5 most likely trees show similar variations on these migration events. Note that the positions of the Coastal and Intermediate populations are unstable across runs. This maybe isn’t shocking, given that these populations aren’t well supported in the other analyses.

Results that are consistent:

  • Offshore Gulf and Offshore Atlantic are always sister.
  • There is no migration involving the Coastal Atlantic population.
  • There is ample migration between the Intermediate and Offshore populations.

Triangle plots

The basic idea behind these is that we can identify early-generation hybrids by both their ancestry and their (interclass) heterozygosity. Loci with highly divergent allele frequencies between the parental populations (frequency difference > 0.7; 0.8 and 0.9 give similar results) are treated as Ancestry Informative Markers (AIMs). An F1 hybrid would then have a hybrid index of 0.5 based on these AIMs (50% of alleles from each parental population). We also calculate the proportion of AIMs in the putative hybrids that are heterozygous for ancestry from the two parents. For an F1, all loci would be heterozygous, so this value would be 1. In an F2, this heterozygosity drops to ~0.5, and it continues to drop if there is backcrossing.

Here’s a nice paper that shows expectations for different scenarios: https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.14039. In short, if individuals follow the expectation curve in the plot below, the pattern is likely due to admixture and not isolation by distance or a similar process. In contrast, there should be no relationship between hybrid index and heterozygosity when admixture has not occurred and IBD is the main feature of the data.
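The two axes of a triangle plot, as described above, can be sketched as follows. This is a simplified illustration, not the code used for the plots: it assumes biallelic SNPs coded as counts (0, 1, 2) of one tracked allele, per-population frequencies of that same allele, and None for missing genotypes. It also treats every heterozygous genotype at an AIM as interclass, which is only strictly true for fixed differences. All names and toy values are hypothetical.

```python
def select_aims(freq_p1, freq_p2, threshold=0.7):
    """Indices of loci whose allele-frequency difference between parents exceeds threshold."""
    return [i for i, (a, b) in enumerate(zip(freq_p1, freq_p2))
            if abs(a - b) > threshold]


def triangle_stats(genotype, freq_p1, freq_p2, aims):
    """Return (hybrid_index, interclass_heterozygosity) for one individual.

    The hybrid index is the fraction of alleles at the AIMs that are typical
    of parent 2; heterozygosity is the fraction of AIMs carrying one allele
    typical of each parent.
    """
    p2_alleles = 0
    het = 0
    n = 0
    for i in aims:
        g = genotype[i]
        if g is None:  # skip missing genotypes
            continue
        # Polarise so that g counts the allele that is more common in parent 2.
        if freq_p2[i] < freq_p1[i]:
            g = 2 - g
        p2_alleles += g
        het += (g == 1)
        n += 1
    if n == 0:
        raise ValueError("no called genotypes at the AIMs")
    return p2_alleles / (2 * n), het / n


# Toy allele frequencies (hypothetical): the last locus is not divergent enough.
freq_p1 = [0.0, 0.1, 1.0, 0.5]
freq_p2 = [1.0, 0.9, 0.0, 0.6]
aims = select_aims(freq_p1, freq_p2, threshold=0.7)

f1_like = [1, 1, 1, 0]      # heterozygous at every AIM
parent1_like = [0, 0, 2, 1]  # homozygous for the parent 1 allele at every AIM

print(aims)
print(triangle_stats(f1_like, freq_p1, freq_p2, aims))
print(triangle_stats(parent1_like, freq_p1, freq_p2, aims))
```

An F1-like individual lands at the top of the triangle (hybrid index 0.5, heterozygosity 1), while a pure parental individual lands in a bottom corner.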

Four population assignment



Six population assignment

with putative hybrids from PCA

The “hybrids” below are the individuals that appear as a separate cluster within the Offshore Gulf in the PCA. Note that they do look like hybrids of some sort.

Six population assignment



There are no F1 hybrids in these data, but the rest of the variation is likely due to admixture, not neutral IBD. There could be some recent-ish hybrids, but that is a little unclear from these results. NewHybrids may help clarify.







still to do

  1. NewHybrids
  2. Population structure within species
  3. Genomic cline analysis
    • using introgress, bgc, or similar (i.e., Gompert).
  4. Gene-environment associations
  5. General selection scans within species

conclusions

as of 2025-03-26

  • Erebennus range includes both the Atlantic and the Gulf. This is a single species.
  • Truncatus range is both the Atlantic and the Gulf. This is a single species.
  • There is substantial and significant divergence between the Gulf and the Atlantic for both of these species. There is more divergence between the Gulf and the Atlantic for Erebennus than for Truncatus, which makes sense.
  • The intermediate population is a result of hybridization between Erebennus and Truncatus.
    • There is minimal structure between the Gulf and the Atlantic.
    • There has likely been lots of backcrossing between the Erebennus and the Intermediate populations, particularly in the Gulf.
    • The intermediate populations are likely Erebennus, potentially a subspecies.

outstanding questions for discussion

  • Stranded individuals were excluded. Should we reconsider this? There are only 11.
  • Do we agree that there is sufficient evidence that Erebennus extends to the Gulf?
  • What should we consider the intermediate population to be?
  • Additional analyses to consider?
  • Changes to existing analyses?